AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Feb-15-2026, 17:43:42 GMT

Weakly Coupled Deep Q-Networks

"subagents," one for each subproblem, and then combine their solutions to establish

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Industry:

Transportation > Ground > Road (0.94)
Transportation > Electric Vehicle (0.94)
Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Stolz, Roland, Eichelbeck, Michael, Althoff, Matthias

Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions

arXiv.org Artificial Intelligence, Dec-1-2025

In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained RL faces challenges regarding effective policy updates, computational efficiency, and predictable runtime. Recent work proposes to use truncated normal distributions for stochastic policy gradient methods. However, the computation of key characteristics, such as the entropy, log-probability, and their gradients, becomes intractable under complex constraints. Hence, prior work approximates these using the non-truncated distributions, which severely degrades performance. We argue that accurate estimation of these characteristics is crucial in the action-constrained RL setting, and propose efficient numerical approximations for them. We also provide an efficient sampling strategy for truncated policy distributions and validate our approach on three benchmark environments, which demonstrate significant performance improvements when using accurate estimations.
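The log-probability and sampling steps named in the abstract above can be illustrated for the simplest case, a one-dimensional normal truncated to an interval, where closed forms exist. This is only a sketch under that assumption; it does not reproduce the paper's numerical approximations for complex constraint sets, and the function names `truncated_log_prob` and `sample_truncated` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def truncated_log_prob(x, mu, sigma, low, high):
    """Log-density of N(mu, sigma^2) truncated to [low, high] (assumed 1-D case)."""
    if not (low <= x <= high):
        return -math.inf
    # Probability mass the untruncated normal places inside the interval.
    mass = STD_NORMAL.cdf((high - mu) / sigma) - STD_NORMAL.cdf((low - mu) / sigma)
    z = (x - mu) / sigma
    untruncated = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
    # Renormalizing by the mass is the term that is lost when the
    # non-truncated density is used as an approximation.
    return untruncated - math.log(mass)


def sample_truncated(mu, sigma, low, high, rng):
    """Inverse-CDF sampling restricted to the CDF range of the interval."""
    cdf_low = STD_NORMAL.cdf((low - mu) / sigma)
    cdf_high = STD_NORMAL.cdf((high - mu) / sigma)
    u = rng.uniform(cdf_low, cdf_high)
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf needs 0 < u < 1
    x = mu + sigma * STD_NORMAL.inv_cdf(u)
    return min(max(x, low), high)  # guard against rounding at the bounds


rng = random.Random(0)
samples = [sample_truncated(0.0, 1.0, -1.0, 1.0, rng) for _ in range(100)]
```

Inside the interval, the truncated log-density is always higher than the untruncated one because the retained mass is below one; the gap grows as the constraint cuts away more of the distribution.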

approximation, machine learning, reinforcement learning, (18 more...)

2511.22406

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ofer Dekel, Arthur Flajolet, Nika Haghtalab, Patrick Jaillet

Online Learning with a Hint

Neural Information Processing Systems, Nov-21-2025, 05:43:57 GMT

We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round.

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.97)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing Systems, Oct-9-2025, 00:40:46 GMT

8912b4892064a4f08a0c04f92913c134-Supplemental-Conference.pdf

artificial intelligence, machine learning, subproblem, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Oct-9-2025, 00:40:43 GMT

8912b4892064a4f08a0c04f92913c134-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Industry:

Transportation > Ground > Road (0.94)
Transportation > Electric Vehicle (0.94)
Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Ali, Ali Mohamed, Tirel, Luca, Hashim, Hashim A.

Novel Multi-Agent Action Masked Deep Reinforcement Learning for General Industrial Assembly Lines Balancing Problems

arXiv.org Artificial Intelligence, Jul-23-2025

Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent project constraint violations, and achieve cost-effective operations. While exact solutions to such challenges can be obtained through Integer Programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods, such as Genetic Algorithms, can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP), without imposing assumptions on the type of assembly line, a notable distinction from most existing models. The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling. To enhance the efficiency of agent training, the paper proposes two innovative tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, thereby reducing training time. The second is a multi-agent approach, where each workstation is managed by an individual agent; as a result, the state and action spaces are reduced. A centralized training framework with decentralized execution is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, demonstrating significantly faster convergence to the optimal solution compared to a comparable model-based approach.
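The action-masking idea in the abstract above can be sketched with one common formulation: infeasible actions are excluded before the softmax, so they receive zero probability. The abstract does not specify the authors' exact mechanism, so the function `masked_softmax` and the example logits below are assumptions made for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over feasible actions only; infeasible actions get probability 0.

    `mask[i]` is True when action i is feasible in the current state.
    """
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three actions, the second one is infeasible.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
```

Because the masked action can never be sampled, the agent spends no training steps discovering that it is invalid, which is the source of the reduced training time the abstract refers to.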

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1016/j.jai.2025.07.001

2507.16635

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Hung, Wei, Sun, Shao-Hua, Hsieh, Ping-Chun

Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

arXiv.org Artificial Intelligence, Mar-17-2025

Action-constrained reinforcement learning (ACRL) is a generic framework for learning control policies with zero action constraint violation, which is required by various safety-critical and resource-constrained applications. Existing ACRL methods can typically achieve favorable constraint satisfaction, but at the cost of either a high computational burden incurred by quadratic programs (QP) or increased architectural complexity due to the use of sophisticated generative models. In this paper, we propose a generic and computationally efficient framework that can adapt a standard unconstrained RL method to ACRL through two modifications: (i) To enforce the action constraints, we leverage the classic acceptance-rejection method, where we treat the unconstrained policy as the proposal distribution and derive a modified policy with feasible actions. (ii) To improve the acceptance rate of the proposal distribution, we construct an augmented two-objective Markov decision process (MDP), which includes additional self-loop state transitions and a penalty signal for the rejected actions. This augmented MDP incentivizes the learned policy to stay close to the feasible action sets. Through extensive experiments in both robot control and resource allocation domains, we demonstrate that the proposed framework simultaneously achieves faster training progress, better constraint satisfaction, and lower action inference time than state-of-the-art ACRL methods. We have made the source code publicly available to encourage further research in this direction.
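Modification (i) in the abstract above relies on the classic acceptance-rejection method, which can be sketched as follows. This is a minimal illustration, not the authors' released code: the Gaussian proposal, the unit-disk feasible set, and the names `sample_feasible_action`, `gaussian_proposal`, and `in_unit_disk` are all hypothetical.

```python
import random


def sample_feasible_action(propose, is_feasible, max_tries=1000):
    """Draw from the proposal until a feasible action appears.

    Returns (action, rejections). The rejection count is the kind of
    quantity a penalty signal for rejected actions could be built from.
    """
    for rejections in range(max_tries):
        action = propose()
        if is_feasible(action):
            return action, rejections
    raise RuntimeError("no feasible action found within max_tries")


rng = random.Random(0)


def gaussian_proposal():
    # Stand-in for an unconstrained policy: a 2-D standard normal.
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def in_unit_disk(action):
    # Hypothetical constraint set: actions inside the unit disk.
    return action[0] ** 2 + action[1] ** 2 <= 1.0


action, rejections = sample_feasible_action(gaussian_proposal, in_unit_disk)
```

The expected number of proposals per accepted action is the inverse of the acceptance rate, which is why the abstract pairs the sampler with a mechanism that keeps the policy close to the feasible set.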

constraint, machine learning, reinforcement learning, (18 more...)

2503.12932

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Theile, Mirco, Dirnberger, Lukas, Trumpp, Raphael, Caccamo, Marco, Sangiovanni-Vincentelli, Alberto L.

Action Mapping for Reinforcement Learning in Continuous Environments with Constraints

arXiv.org Artificial Intelligence, Dec-5-2024

Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.

agent, feasibility model, feasible action, (14 more...)

2412.04327

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

McMahan, Jeremy, Zhu, Xiaojin

Anytime-Constrained Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence, Oct-31-2024

We introduce anytime constraints to the multi-agent setting with the corresponding solution concept being anytime-constrained equilibrium (ACE). Then, we present a comprehensive theory of anytime-constrained Markov games, which includes (1) a computational characterization of feasible policies, (2) a fixed-parameter tractable algorithm for computing ACE, and (3) a polynomial-time algorithm for approximately computing feasible ACE. Since computing a feasible policy is NP-hard even for two-player zero-sum games, our approximation guarantees are the best possible under worst-case analysis. We also develop the first theory of efficient computation for action-constrained Markov games, which may be of independent interest.

algorithm, constraint, feasible policy, (15 more...)